Learning Rules from Incomplete Examples via a Probabilistic Mention Model
Authors
Abstract
We consider the problem of learning rules from natural language text sources. These sources, such as news articles, journal articles, and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the correct conclusions. We study the problem of learning domain knowledge from such concise texts, which is an instance of the general problem of learning in the presence of missing data. However, unlike standard approaches to missing data, in this setting we know that facts are more likely to be missing from the text in cases where the reader can infer them from the facts that are mentioned combined with the domain knowledge. Hence, we can explicitly model this “missingness” process and invert it via probabilistic inference to learn the underlying domain knowledge. This paper introduces an explicit probabilistic mention model that models the probability of facts being mentioned in the text based on what other facts have already been mentioned and domain knowledge in the form of Horn clause rules. Learning must simultaneously search the space of rules and learn the parameters of the mention model. We accomplish this via an application of Expectation Maximization within a Markov Logic framework. An experimental evaluation on synthetic and natural text data shows that the method can successfully learn accurate rules and apply them to new texts to make correct inferences.
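The idea of inverting the "missingness" process can be illustrated with a toy calculation. The sketch below is not the paper's Markov Logic implementation. It assumes a single hypothetical rule A => B, documents that all mention A, a writer who never mentions a false fact, and a fixed, known probability `p_mention_if_inferable` that a true but inferable B is still mentioned. All names and numbers are illustrative assumptions. Under those assumptions, a few lines of Expectation Maximization recover the rule's confidence from how often B is mentioned.

```python
def posterior_true_given_unmentioned(rule_conf, p_mention_if_inferable):
    """P(B is true | A mentioned, B not mentioned) under the toy mention model."""
    # B is true but the writer left it out because the reader can infer it.
    true_and_silent = rule_conf * (1.0 - p_mention_if_inferable)
    # B is false; assumption: a false fact is never mentioned.
    false_and_silent = 1.0 - rule_conf
    return true_and_silent / (true_and_silent + false_and_silent)


def em_rule_confidence(n_docs, n_b_mentioned, p_mention_if_inferable, iters=200):
    """Estimate P(B | A) by EM, treating unmentioned B as missing, not false."""
    rule_conf = 0.5  # arbitrary starting guess
    n_silent = n_docs - n_b_mentioned
    for _ in range(iters):
        # E-step: expected truth value of B in documents that omit it.
        q = posterior_true_given_unmentioned(rule_conf, p_mention_if_inferable)
        # M-step: re-estimate the rule confidence from expected counts.
        rule_conf = (n_b_mentioned + q * n_silent) / n_docs
    return rule_conf


# Hypothetical corpus: 1000 documents mention A, 400 of them also mention B.
naive = 400 / 1000  # treats "not mentioned" as "false"
estimate = em_rule_confidence(1000, 400, p_mention_if_inferable=0.5)
print(naive, round(estimate, 4))
```

With these illustrative numbers, the naive estimate is 0.4, while the EM estimate converges to 0.8: half of the true instances of B go unmentioned, so the observed mention rate understates the rule's confidence by that factor. In the paper's setting the mention-model parameters are learned jointly with the rules rather than fixed in advance as they are here.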
Similar resources
Mention Model for Learning Rules from Incomplete Examples
Introduction. We are motivated by the problem of learning rules from naturally available data sources such as natural language texts, web pages, and medical databases. At first, learning rules from natural sources like the web seems to consist of extracting specific facts followed by data mining of rules. Unfortunately, however, there are two major obstacles to fully realizing the dream of unli...
Learning Rules from Incomplete Examples via Implicit Mention Models
We study the problem of learning general rules from concrete facts extracted from natural data sources such as newspaper stories and medical histories. Natural data sources present two challenges to automated learning, namely, radical incompleteness and systematic bias. In this paper, we propose an approach that combines simultaneous learning of multiple predictive rules with differential s...
Learning Rules from Incomplete Examples: A Pragmatic Approach
In this paper, we consider the problem of inductively learning rules from specific facts extracted from texts. This problem is challenging for two reasons. First, natural texts are radically incomplete since there are always too many facts to mention. Second, natural texts are systematically biased towards novelty and surprise, which presents an unrepresentative sample to the learner. Our so...
Inverting Grice's Maxims to Learn Rules from Natural Language Extractions
We consider the problem of learning rules from natural language text sources. These sources, such as news articles and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the correct conclusions. We...
Protection of Persons with Disabilities in International Law
Objective: People with disabilities need special legal attention. In this regard, special rules have gradually been developed in domestic and international law. The Convention on the Rights of Persons with Disabilities (2006) at the international level and the Iranian Act on Comprehensive Protection of Persons with Disabilities (1383) at the national level are examples of the above-mentioned legal develop...